This exploratory data analysis uses the Redfin listing dataset for Maine and New Hampshire to explore and assess the connection between various numerical and categorical variables and home prices. The dataset combines information from several thousand recent home sales with population health metrics from the BRFSS dataset, localized to each house’s neighborhood. It contains 3,015 rows and 48 columns, whose names abbreviate either survey questions about respondents’ health or attributes of recent home sales. These include general health status, check-ups, blood pressure medication, asthma, diabetes, property type, address, city, state, zip code, price, bedroom count, bathroom count, property size, lot size, year built, days on the market, etc. The inclusion of different kinds of data, ranging from categorical indicators to numerical metrics, gives a comprehensive picture of health dynamics and home sales across diverse geographical regions.
This analysis will primarily center on examining the categorical columns providing information on property type, state, and city, alongside the numerical columns detailing home prices, bedroom count, bathroom count, year built, days on the market, and property size (in square feet). Through descriptive analysis, linear regression modeling, and multiple regression analysis, the objective is to gain insights into various numerical and categorical variables. This involves examining distributions across different demographic categories and exploring the relationships between these variables and home prices. The preparation for analyzing this dataset included simplifying and renaming columns for readability and removing columns that weren’t relevant to this exploration. Rows with missing data (NAs) were removed before each analysis question was explored rather than once at the beginning, when the dataset was initially prepared. This approach ensures that a row is dropped only when it is missing a value needed for the specific question being studied. Additionally, new data frames and columns were created for this EDA.
Furthermore, specific questions (see below) are asked for two-sample hypothesis testing, simple linear regression predicting home prices with categorical variables, and multiple linear regression models predicting home prices using at least five independent features, facilitating further analysis. These questions will be addressed through the application of inferential statistics and hypothesis testing methodologies, as well as the utilization of linear regression models and multiple regression analysis. This approach aims to clarify population parameters, uncover potential disparities among groups, and delve deeper into the relationships between multiple variables, facilitating a comprehensive understanding of the data.
The following assignment will analyze the relationship between five numerical variables (bedroom count, bathroom count, property size, year built, and days on the market) and home prices. This assignment includes the R code used for computing correlation analysis and linear regression models, aiding in the assessment of relationships between variables. This report aims to identify pertinent questions and to use inferential statistics to test hypotheses and draw meaningful conclusions about the population under study. There will be a summary of the dataset, the creation and calculations of new data frames and columns, and analysis of the following questions:
Two-Sample Tests
Questions:
At a significance level of 0.05:
Is there a significant difference between the average home prices of single-family residential properties located in Portland, ME and those in Bangor, ME?
Is there a significant difference between the average home prices of condo/co-op properties located in Conway, NH and those in Concord, NH?
Linear Regression Models
Questions:
At a significance level of 0.05:
Does a relationship exist between home prices and property size (sq. ft) for single-family residential properties in Portland, ME compared to those in Bangor, ME?
Does a relationship exist between home prices and bedroom quantity for condo/co-op properties in Conway, NH compared to those in Concord, NH?
Through thorough exploration and analysis of the dataset, valuable insights can be unearthed, benefiting public health programs and contributing to the well-being of communities nationwide. Rigorous examination and meticulous consideration of the dataset can result in meaningful conclusions and actionable recommendations to address the questions at hand.
The next 2 questions are categorized as two-sample tests because each one involves comparing the means of two distinct groups. In each case, there are two separate samples being compared to each other. These comparisons involve independent samples as each group being compared is distinct and unrelated to the others. Each sample is independently selected from its respective population, without any overlap or connection between the individuals in one group and those in another. For example, when comparing home prices of single-family residential properties between Portland, ME and Bangor, ME, the home prices in each region form distinct and independent groups. Similarly, when comparing home prices of condo/co-op properties between Conway, NH and Concord, NH, the home prices in each region are identified as independent samples. Overall, independent samples are utilized in these comparisons due to the distinct, unrelated nature of the groups, each selected independently from its population.
Furthermore, a confidence level of 95% is selected for these problems because it balances precision and certainty: it offers a reasonable level of certainty without excessively widening the intervals. Therefore, the questions are set at a significance level (α) of 0.05. An α of 0.05 indicates a 5% chance of committing a Type I error in hypothesis testing, meaning a 5% probability of rejecting the null hypothesis when it is true. This significance level is commonly adopted as a moderate threshold for rejecting or not rejecting hypotheses. Additionally, the t-test is employed because the population standard deviation (σ) is unknown and the sample sizes are small, which makes the t-test the most appropriate choice.
The question posed in the initial scenario is whether there is a significant difference between the average home prices of single-family residential properties in Portland, ME and those in Bangor, ME. Therefore, the two new data frames (“PriceSingleFamilyPortland” and “PriceSingleFamilyBangor”) created to help answer this question contain the variables state (“ME”), city (“Portland” or “Bangor”), property type (“Single Family Residential”), and price from the Redfin dataset. After filtering the new datasets to include solely the relevant variables and eliminating rows with missing information (NAs), the PriceSingleFamilyPortland dataset had a sample size of 20 and the PriceSingleFamilyBangor dataset had a sample size of 24.
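The filtering step described above can be sketched in R. This is a minimal illustration on a toy data frame, not the original analysis code: the column names (`STATE`, `CITY`, `PROPERTY_TYPE`, `PRICE`, `BEDS`), the helper `subset_for_question()`, and all values are assumptions made for the example.

```r
# Toy stand-in for the Redfin listing data; values and column names are
# illustrative assumptions, not the real dataset.
listings <- data.frame(
  STATE = c("ME", "ME", "ME", "ME", "NH", "ME"),
  CITY = c("Portland", "Portland", "Bangor", "Bangor", "Conway", "Portland"),
  PROPERTY_TYPE = c("Single Family Residential", "Single Family Residential",
                    "Single Family Residential", "Condo/Co-op",
                    "Condo/Co-op", "Single Family Residential"),
  PRICE = c(450000, NA, 300000, 200000, 350000, 520000),
  BEDS = c(3, 4, NA, 2, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep one city/property-type combination and drop rows
# that are missing only in the columns this question needs.
subset_for_question <- function(data, state, city, type, needed = "PRICE") {
  keep <- data$STATE == state & data$CITY == city & data$PROPERTY_TYPE == type
  out <- data[keep, c("STATE", "CITY", "PROPERTY_TYPE", needed), drop = FALSE]
  out[complete.cases(out[, needed, drop = FALSE]), , drop = FALSE]
}

price_portland <- subset_for_question(listings, "ME", "Portland",
                                      "Single Family Residential")
price_bangor <- subset_for_question(listings, "ME", "Bangor",
                                    "Single Family Residential")
nrow(price_portland)
nrow(price_bangor)
```

In this toy data the Bangor listing with a missing bedroom count is still kept for the price question, which is the reason for removing NAs per question rather than once up front.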
For this question, the null hypothesis (𝐻0) states there is no significant difference between the average home prices of single-family residential properties in Portland, ME and those in Bangor, ME. The alternative hypothesis (𝐻1) states there is a significant difference between the average home prices of single-family residential properties in Portland, ME and those in Bangor, ME. Furthermore, the critical value and p-value are used to determine whether to reject or not reject the null hypothesis.
In the comparison of average home prices of single-family residential properties in Portland, ME and Bangor, ME, the results from the two-sample test (Table 1) indicate a statistically significant difference. The estimated average home price in Bangor is $790,674.95 (estimate1), while in Portland it is $357,621.71 (estimate2), a difference of $433,053.24. The test statistic is 6.35, with a p-value of less than 0.001, indicating a significant difference in home prices between the two cities. The 95% confidence interval for the difference in means ranges from $293,463.52 to $572,642.96. These findings imply that, on average, home prices in Bangor, ME are $433,053.24 higher than home prices in Portland, ME.
Table 1
Comparison of Average Home Prices of Single-Family Residential Properties Located in Portland, ME and Bangor, ME: Two-Sample Test Results (α = 0.05)

| estimate (difference in means) | estimate1 (mean, group 1) | estimate2 (mean, group 2) | statistic | p | parameter (df) | Method | Alternative | 95% CI |
|---|---|---|---|---|---|---|---|---|
| 433,053.24 | 790,674.95 | 357,621.71 | 6.35 | < .001*** | 28.77 | Welch Two Sample t-test | two.sided | [293463.52, 572642.96] |
The following outlines the step-by-step process used to arrive at the decision to reject the null hypothesis, along with a summary of its meaning.
Question: Is there a significant difference between the average home prices of single-family residential properties in Portland, ME and those in Bangor, ME at an α of 0.05?
Step 1: Identify the Hypothesis and Claim
𝐻0: 𝜇1 - 𝜇2 = 0
𝐻1: 𝜇1 - 𝜇2 ≠ 0 (Claim)
Step 2: Find the Critical Value
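The critical value is ±2.09. The test is two-tailed with α = 0.05, and the degrees of freedom are taken as the smaller of n1 – 1 and n2 – 1. In this case, n1 – 1 = 20 – 1 = 19 and n2 – 1 = 24 – 1 = 23. Therefore, d.f. = 19. (Table 1 reports the Welch-adjusted degrees of freedom, 28.77, which gives a slightly smaller critical value and the same decision.)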
Step 3: Compute Test Value
The formula below is used to calculate the test value:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
The test value is 6.35.
Step 4: Decision to Reject or to Not Reject Null Hypothesis
Reject the null hypothesis since 6.35 > 2.09.
Step 5: Summarize The Results
Reject the null hypothesis because there is enough evidence to support the claim that there is a significant difference in the average home prices of single-family residential properties located in Portland, ME and those in Bangor, ME at an alpha of 0.05. Additionally, the p-value (< 0.001) supports this conclusion because it is less than α (0.05).
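The decision in Steps 2 to 4 can be checked numerically. The sketch below is a minimal R illustration that assumes only the values reported above (sample sizes 20 and 24, test statistic 6.35, Welch degrees of freedom 28.77); the variable names are illustrative and are not taken from the original analysis code.

```r
# Values reported in the text and Table 1
alpha <- 0.05
n1 <- 20           # Portland sample size
n2 <- 24           # Bangor sample size
t_stat <- 6.35     # reported test statistic
welch_df <- 28.77  # "parameter" column in Table 1

# Conservative textbook degrees of freedom: the smaller of n1 - 1 and n2 - 1
df_conservative <- min(n1 - 1, n2 - 1)

# Two-tailed critical values under each choice of degrees of freedom
crit_conservative <- qt(1 - alpha / 2, df = df_conservative)
crit_welch <- qt(1 - alpha / 2, df = welch_df)

# Two-tailed p-value using the Welch degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = welch_df)

# Decision rule: reject H0 when |t| exceeds the critical value
reject_null <- abs(t_stat) > crit_conservative
round(c(crit_conservative, crit_welch), 2)
reject_null
```

The conservative rule gives the larger critical value, so a test statistic that clears it also clears the Welch-based threshold.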
In this case, the question is if there is a significant difference between the average home prices of condo/co-op properties in Conway, NH and those in Concord, NH. Therefore, the two new data frames (“PriceCondoConway” and “PriceCondoConcord”) created to help answer this question contain the variables state (“NH”), city (“Conway” or “Concord”), property type (“Condo/Co op”), and price from the Redfin dataset. After filtering the new datasets to include solely the relevant variables and eliminating rows with missing information (NA’s), the PriceCondoConway dataset consisted of a sample size of 17. The dataset PriceCondoConcord consisted of a sample size of 8.
For this question, the null hypothesis (𝐻0) states there is no significant difference between the average home prices of condo/co-op properties in Conway, NH and those in Concord, NH. The alternative hypothesis (𝐻1) states there is a significant difference between the average home prices of condo/co-op properties in Conway, NH and those in Concord, NH. Furthermore, the critical value and p-value are used to determine whether to reject or not reject the null hypothesis.
Table 2 presents the results of a two-sample test comparing the average home prices of condo/co-op properties in Conway, NH and Concord, NH. The estimated average home price in Concord is $705,741.18 (estimate1), whereas in Conway it is $323,612.50 (estimate2), a difference of $382,128.68. The test statistic is 3.11, with a p-value of 0.006, indicating a significant difference in home prices between the two locations. The 95% confidence interval for the difference in means ranges from $122,885.10 to $641,372.25. These findings imply that, on average, home prices of condo/co-op properties in Concord, NH are $382,128.68 higher than prices in Conway, NH.
Table 2
Comparison of Average Home Prices of Condo/Co-Op Properties Located in Conway, NH and Concord, NH: Two-Sample Test Results (α = 0.05)

| estimate (difference in means) | estimate1 (mean, group 1) | estimate2 (mean, group 2) | statistic | p | parameter (df) | Method | Alternative | 95% CI |
|---|---|---|---|---|---|---|---|---|
| 382,128.68 | 705,741.18 | 323,612.50 | 3.11 | .006** | 16.68 | Welch Two Sample t-test | two.sided | [122885.10, 641372.25] |
The following outlines the step-by-step process used to arrive at the decision to reject the null hypothesis, along with a summary of its meaning.
Question: Is there a significant difference between the average home prices of condo/co-op properties in Conway, NH and those in Concord, NH at an α of 0.05?
Step 1: Identify the Hypothesis and Claim
𝐻0: 𝜇1 - 𝜇2 = 0
𝐻1: 𝜇1 - 𝜇2 ≠ 0 (Claim)
Step 2: Find the Critical Value
The critical value is ±2.36. The test is two-tailed with α = 0.05, and the degrees of freedom are taken as the smaller of n1 – 1 and n2 – 1. In this case, n1 – 1 = 17 – 1 = 16 and n2 – 1 = 8 – 1 = 7. Therefore, d.f. = 7.
Step 3: Compute Test Value
The formula below is used to calculate the test value:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
The test value is 3.11.
Step 4: Decision to Reject or to Not Reject Null Hypothesis
Reject the null hypothesis since 3.11 > 2.36.
Step 5: Summarize The Results
Reject the null hypothesis because there is enough evidence to support the claim that there is a significant difference in the average home prices of condo/co-op properties located in Conway, NH and those in Concord, NH at an alpha of 0.05. Additionally, the p-value (0.006) supports this conclusion because it is less than α (0.05).
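As a consistency check on Table 2, the reported test statistic can be recovered from the reported confidence interval. The R sketch below assumes only the tabulated values (estimate, interval, and Welch degrees of freedom); the variable names are illustrative, and small discrepancies are expected because the table values are rounded.

```r
# Values reported in Table 2
estimate <- 382128.68              # difference in mean prices
ci <- c(122885.10, 641372.25)      # 95% confidence interval
welch_df <- 16.68                  # "parameter" column
alpha <- 0.05

# The interval is estimate +/- t* x SE, so the standard error can be
# recovered from the interval's half-width.
t_star <- qt(1 - alpha / 2, df = welch_df)
half_width <- diff(ci) / 2
se <- half_width / t_star

# Recomputed test statistic and two-tailed p-value
t_recomputed <- estimate / se
p_recomputed <- 2 * pt(-abs(t_recomputed), df = welch_df)

round(t_recomputed, 2)
round(p_recomputed, 3)
```

The recomputed statistic and p-value should land close to the reported 3.11 and .006, which indicates that the estimate, interval, and degrees of freedom in the table are mutually consistent.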
As significant differences in average home prices between the two cities compared in each state were established in the previous questions (single-family residential properties in Portland, ME versus Bangor, ME, and condo/co-op properties in Conway, NH versus Concord, NH), the next step is to delve deeper into these distinctions based on property characteristics such as property size (sq. ft) and bedroom quantity. For the next two questions, the decision to subset the dataset rather than use a dummy variable was made strategically, considering the nature of the analysis and the specific research questions being explored. The reasons for subsetting the dataset rather than using a dummy variable are as follows:
Distinct Relationships: Subsetting the dataset allows for the examination of the relationship between home prices and property size or the number of bedrooms within each distinct location (Portland, ME and Bangor, ME for the first question, and Conway, NH and Concord, NH for the second question). Analyzing each location separately can capture any unique trends or patterns that might exist within each area. This approach provides a more comprehensive understanding of how home prices vary with property characteristics within specific geographical regions.
Avoiding Assumptions: Using only a dummy variable to represent location (without an interaction term) assumes that the relationship between home prices and property size or bedroom count has the same slope in every location, with location shifting only the intercept. However, this assumption may not be true in reality. Subsetting the dataset allows for the exploration of location-specific relationships without making this assumption. It acknowledges the possibility that the relationship between home prices and property characteristics may vary depending on the location.
Enhanced Interpretation: Subsetting the dataset can simplify the interpretation of results. Focusing on specific locations can provide clearer and more targeted insights into the relationship between home prices and property size or bedroom count within those areas in Maine or New Hampshire. This approach enhances the relevance and applicability of the findings, particularly for stakeholders interested in understanding local housing market dynamics.
Overall, the decision to subset the dataset rather than using a dummy variable reflects a more comprehensive and context-specific approach to analysis, allowing for a deeper exploration of location-specific relationships and potentially yielding more insightful findings. Additionally, a confidence level of 95% and a significance level (alpha, α) of 0.05 are chosen for the reasons mentioned earlier.
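The trade-off between a dummy variable and subsetting can be illustrated with simulated data. The R sketch below is an assumption-laden toy example: the two cities ("A" and "B"), their slopes, and all prices are invented, and none of it comes from the Redfin dataset. It shows that a city dummy alone forces one shared slope, whereas a dummy with an interaction term reproduces the slopes of separate per-city fits.

```r
set.seed(42)

# Simulated listings for two hypothetical cities with different slopes
n <- 40
sim <- data.frame(
  city = rep(c("A", "B"), each = n),
  sqft = runif(2 * n, min = 800, max = 3500)
)
true_slope <- ifelse(sim$city == "A", 150, 300)
sim$price <- 100000 + true_slope * sim$sqft + rnorm(2 * n, sd = 40000)

# (1) City dummy only: one shared slope, city shifts the intercept
fit_dummy <- lm(price ~ sqft + city, data = sim)

# (2) Dummy plus interaction: each city gets its own intercept and slope
fit_interaction <- lm(price ~ sqft * city, data = sim)

# (3) Separate fits on each subset
fit_a <- lm(price ~ sqft, data = subset(sim, city == "A"))
fit_b <- lm(price ~ sqft, data = subset(sim, city == "B"))

# Slopes implied by the interaction model match the subset fits
slope_a <- unname(coef(fit_interaction)["sqft"])
slope_b <- slope_a + unname(coef(fit_interaction)["sqft:cityB"])
c(slope_a, unname(coef(fit_a)["sqft"]))
c(slope_b, unname(coef(fit_b)["sqft"]))
```

Subsetting and the interaction model give the same slopes; the practical difference is that separate fits also estimate a separate residual variance for each location.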
The linear regression model assessing the relationship between home prices of single-family residential properties and property size (Sq. Ft) in Portland, ME and Bangor, ME indicates a moderate positive correlation. The correlation coefficient (𝑟) value of 0.57 (the square root of 𝑟^2 = 0.325) suggests a moderately strong linear relationship between home prices and property size. Furthermore, the coefficient of determination (𝑟^2) value of 0.325 indicates that approximately 32.5% of the variability in home prices can be explained by the variation in property size. The analysis is conducted with 44 rows (20 from Portland and 24 from Bangor) from the new data frame “PriceSingleFamilyPortlandBangor,” which is consistent with the 42 residual degrees of freedom implied by the confidence intervals in Table 3.
Additionally, as depicted in Figure 1, an upward trend is observed: as property size increases, home prices generally tend to increase, although some variability is evident. However, it’s important to note that other factors beyond property size also influence home prices. Therefore, while property size is a significant predictor, it does not fully account for the variability observed in home prices.
Table 3
Linear Regression Results: Home Price by Property Size (Sq. Ft), Single-Family Residential Properties in Portland, ME and Bangor, ME

| Term | estimate | std.error | statistic | p | 95% CI |
|---|---|---|---|---|---|
| (Intercept) | 135,126.46 | 100,777.26 | 1.34 | .187 | [-68250.29, 338503.21] |
| `SQUARE FEET` | 211.95 | 47.15 | 4.50 | < .001*** | [116.80, 307.10] |
[Figure 1: scatter plot of home price against property size (sq. ft) with a fitted trend line]
Linear Regression Results:
The linear regression model (Table 3) predicting home prices based on property size (Sq. Ft.) in Portland, ME and Bangor, ME shows the following results:
Regression Intercept: The intercept term is $135,126.46, indicating the estimated home price when the property size is zero. However, this term is not statistically significant with a p-value of 0.187.
Regression Coefficient (SQUARE FEET): The coefficient for property size is $211.95, suggesting that for each additional square foot, the home price increases by approximately $211.95. This coefficient is highly significant (p < 0.001).
These findings suggest that property size (SQUARE FEET) is a statistically significant predictor of home prices in Portland, ME and Bangor, ME. The intercept is not statistically significant and, because no home has zero square feet, it has no meaningful interpretation in this context. Overall, while property size plays a significant role in determining home prices, it is essential to consider other factors as well to obtain a comprehensive understanding of the housing market dynamics in Portland, ME and Bangor, ME.
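As a worked illustration of the fitted line in Table 3, the estimated price for a hypothetical 2,000 sq. ft property (an assumed example, not a listing from the dataset) follows directly from the two coefficients:

```latex
\begin{aligned}
\widehat{\text{Price}} &= 135{,}126.46 + 211.95 \times \text{SquareFeet} \\
\widehat{\text{Price}}\,\big|_{\text{SquareFeet} = 2000}
  &= 135{,}126.46 + 211.95 \times 2{,}000 \\
  &= 135{,}126.46 + 423{,}900.00 \\
  &= 559{,}026.46
\end{aligned}
```

This is a point estimate only; with roughly a third of the price variability explained by size, individual sale prices can differ substantially from it.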
The linear regression model analyzing the relationship between home prices of condo/co-op properties and bedroom count in Conway, NH and Concord, NH reveals a moderate positive correlation. A correlation coefficient (r) value of 0.461 suggests a moderate linear association between home prices and bedroom count. Additionally, the coefficient of determination (𝑟^2) value of 0.212 indicates that approximately 21.2% of the variability in home prices can be attributed to variations in bedroom count. The analysis is conducted with 25 rows from the new data frame “PriceCondoConwayConcord.”
As seen in Figure 2, an upward trend is observed: as the number of bedrooms increases, there tends to be a corresponding increase in home prices, though some variability exists. However, it’s important to acknowledge that factors beyond bedroom count also impact home prices. Therefore, while bedroom count serves as a significant predictor, it doesn’t fully explain the observed variability in home prices.
Table 4
Linear Regression Results: Home Price by Bedroom Count, Condo/Co-Op Properties in Conway, NH and Concord, NH

| Term | estimate | std.error | statistic | p | 95% CI |
|---|---|---|---|---|---|
| (Intercept) | -114,003.03 | 291,622.51 | -0.39 | .699 | [-717270.15, 489264.09] |
| BEDS | 311,367.42 | 125,032.12 | 2.49 | .020* | [52718.78, 570016.07] |
[Figure 2: scatter plot of home price against bedroom count with a fitted trend line]
Linear Regression Results:
The linear regression model (Table 4) predicting home prices based on the number of bedrooms in Conway, NH and Concord, NH shows the following results:
Regression Intercept: The intercept term is -$114,003.03, indicating the estimated home price when the number of bedrooms is zero. However, this term is not statistically significant with a p-value of 0.699.
Regression Coefficient (BEDS): The coefficient for the number of bedrooms is $311,367.42, suggesting that for each additional bedroom, the home price increases by approximately $311,367.42. This coefficient is statistically significant with a p-value of 0.020.
These findings indicate that the number of bedrooms is a statistically significant predictor of home prices in Conway, NH and Concord, NH. The intercept is not statistically significant, and the estimated home price when there are no bedrooms is negative, so it is not meaningful in this context. Overall, the results highlight the importance of considering the number of bedrooms when predicting home prices in Conway, NH and Concord, NH.
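To make the Table 4 coefficients concrete, the fitted line can be evaluated at a few bedroom counts. This is a minimal R sketch that uses only the two reported coefficients; the function name `predict_price()` and the chosen bedroom counts are illustrative assumptions.

```r
# Coefficients reported in Table 4
intercept <- -114003.03
slope_beds <- 311367.42

# Predicted price from the fitted line: price = intercept + slope * beds
predict_price <- function(beds) intercept + slope_beds * beds

beds <- 0:3
predictions <- data.frame(beds = beds, predicted_price = predict_price(beds))
predictions
```

The row for zero bedrooms returns a negative price, which shows why the intercept should not be read as a real-world value, and the wide confidence interval on the slope means the other rows are rough estimates as well.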
Building upon the earlier insights gained in this analysis, there arose a necessity to delve deeper into the collective impact of multiple independent variables (including bedroom count, bathroom count, property size, year built, and days on the market) on home prices across diverse regions like Portland, ME, Bangor, ME, Conway, NH, and Concord, NH. Hence, a multiple linear regression analysis was undertaken with a significance level (alpha) of 0.05. This analytical approach aimed to explore the relationship between various property characteristics and home prices, thereby offering valuable insights into the dynamics of the housing market.
Table 5
Multiple Linear Regression Results: Home Price by Bedroom Count, Bathroom Count, Property Size, Year Built, and Days on Market

| Term | estimate | std.error | statistic | p | 95% CI |
|---|---|---|---|---|---|
| (Intercept) | -6,240,591.08 | 1,622,160.12 | -3.85 | < .001*** | [-9474308.18, -3006873.99] |
| BEDS | -13,423.13 | 63,712.86 | -0.21 | .834 | [-140432.39, 113586.13] |
| BATHS | 61,884.33 | 75,541.14 | 0.82 | .415 | [-88704.19, 212472.84] |
| `SQUARE FEET` | 237.36 | 93.37 | 2.54 | .013* | [51.23, 423.50] |
| `YEAR BUILT` | 3,223.78 | 811.76 | 3.97 | < .001*** | [1605.56, 4842.01] |
| `DAYS ON MARKET` | 721.44 | 421.50 | 1.71 | .091 | [-118.79, 1561.68] |
Table 5 presents the results of the multiple linear regression analysis, focusing on the relationship between home prices of single-family residential and condo/co-op properties across the specified locations. Key findings include:
Overall Model Fit: The coefficient of determination (𝑅^2) value, which measures the proportion of variance in home prices explained by the independent variables (bedroom count, bathroom count, property size (Sq. Ft.), year built, and days on the market) is 0.421201. This indicates that approximately 42.12% of the variability in home prices can be attributed to the combined effects of the predictors included in the model. This suggests that the model provides a moderate level of predictive power for explaining home prices across the specified locations.
Correlation Strength: The multiple correlation coefficient (𝑅), which is the correlation between the observed home prices and the prices predicted by the model, is 0.649. This suggests a moderately strong association between home prices and the predictors taken together (bedroom count, bathroom count, property size (Sq. Ft.), year built, and days on the market).
Intercept: The intercept term (-$6,240,591.08) is the estimated home price when all independent variables are zero. It is statistically significant (p < .001), and the confidence interval suggests that the true population intercept lies between -$9,474,308.18 and -$3,006,873.99. Because no property has a year built of zero, the intercept is an extrapolation far outside the data and should not be interpreted as a price.
BEDS: The coefficient estimate for the number of bedrooms (BEDS) is negative, implying a decrease in home prices as the number of bedrooms increases when the other predictors are held constant, although it’s not statistically significant (p = .834).
BATHS: The coefficient estimate for the number of bathrooms (BATHS) is positive, suggesting that an increase in the number of bathrooms is associated with higher home prices when the other predictors are held constant, but it’s not statistically significant (p = .415).
SQUARE FEET: The coefficient estimate for property size (SQUARE FEET) is positive and statistically significant (p = .013), indicating that larger properties tend to have higher home prices. The confidence interval suggests that for every additional square foot, home prices may increase by an amount ranging from $51.23 to $423.50.
YEAR BUILT: The coefficient estimate for the year built (YEAR BUILT) is positive and statistically significant (p < .001), indicating that newer properties tend to have higher prices. The confidence interval suggests that for each year later a property was built, home prices may increase by an amount ranging from $1,605.56 to $4,842.01.
DAYS ON MARKET: The coefficient estimate for the number of days a property spends on the market (DAYS ON MARKET) is positive but not statistically significant (p = .091), suggesting that there may be a slight positive association between longer listing periods and higher home prices, but this relationship is not strong enough to be deemed significant.
Overall, these regression results offer valuable insights into the factors influencing home prices across different regions. By understanding the impact of various property characteristics, stakeholders can make informed decisions regarding real estate investments and market strategies.
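The Table 5 coefficients can be combined into a single predicted price. The R sketch below is an illustration under stated assumptions: the coefficients and 𝑅^2 are the reported values, while the example house (3 bedrooms, 2 bathrooms, 1,800 sq. ft, built in 1990, 30 days on the market) and the short variable names are hypothetical.

```r
# Coefficients reported in Table 5
coefs <- c(
  intercept = -6240591.08,
  beds = -13423.13,
  baths = 61884.33,
  sqft = 237.36,
  year_built = 3223.78,
  days_on_market = 721.44
)

# Hypothetical listing (not taken from the dataset); the leading 1
# multiplies the intercept.
house <- c(intercept = 1, beds = 3, baths = 2, sqft = 1800,
           year_built = 1990, days_on_market = 30)

# Fitted value = sum of coefficient x predictor value
predicted_price <- sum(coefs * house[names(coefs)])

# Multiple R is the square root of the reported R-squared
r_squared <- 0.421201
multiple_r <- sqrt(r_squared)

round(predicted_price, 2)
round(multiple_r, 3)
```

The large negative intercept is offset by the year-built term once a realistic construction year is entered, which is why the intercept on its own has no useful interpretation.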
This exploratory data analysis has provided valuable insights into the relationship between various property characteristics and home prices across Portland, ME, Bangor, ME, Conway, NH, and Concord, NH. Drawing on the combined dataset of Redfin listings and BRFSS population health metrics, this analysis has offered a comprehensive understanding of housing market dynamics in these regions.
Through descriptive analysis, hypothesis testing, and linear regression modeling, key findings include:
Regional Disparities: Significant differences in home prices were observed between different regions, highlighting the importance of location in determining property values. For example, Bangor, ME exhibited significantly higher average home prices compared to Portland, ME, while Concord, NH showed higher prices compared to Conway, NH.
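Property Characteristics: In the simple regressions, property size (Sq. Ft.) was a significant predictor of single-family home prices in Portland and Bangor, and bedroom count was a significant predictor of condo/co-op prices in Conway and Concord. In the multiple regression, only property size and year built were statistically significant; bedroom count, bathroom count, and days on the market were not. Larger and newer properties tended to command higher prices.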
Model Insights: The multiple linear regression model provided valuable insights into the combined impact of these property characteristics on home prices. The model demonstrated moderate explanatory power, accounting for approximately 42.12% of the variability in home prices across the specified regions.
Despite these insights, it’s important to acknowledge the limitations of this analysis, including data quality issues, geographic constraints, and statistical assumptions. These limitations, along with suggestions for further improvements to enhance the thoroughness and applicability of future analyses, are outlined below.
Limitations of this Analysis:
While this exploratory data analysis offers valuable insights into the relationship between various property characteristics and home prices across different regions, it’s essential to acknowledge certain limitations:
Data Quality: The accuracy and completeness of the Redfin listing dataset and the BRFSS dataset, from which the information is derived, could impact the validity of the analysis. Any missing data in these datasets might introduce biases or limitations to the findings.
Generalizability: The findings of this analysis are specific to the regions of Portland, ME, Bangor, ME, Conway, NH, and Concord, NH. Expanding these results to other locations may not be appropriate due to variations in housing market dynamics, population demographics, and other factors.
Variable Selection: The variables included in the analysis are limited to those available in the dataset. Other potentially relevant factors, such as neighborhood characteristics, proximity to amenities, or economic indicators, may also influence home prices but were not considered in this analysis.
Statistical Assumptions: The statistical tests and models used in this analysis are based on certain assumptions, such as the normality of data distributions and the independence of observations. Violations of these assumptions could affect the validity of the results.
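One way the normality and constant-variance assumptions could be examined is sketched below in R. This is a hedged illustration on simulated data, not a check that was run on the Redfin dataset: the variables, sample size, and noise level are all invented, and the absolute-residual correlation is only a rough heuristic rather than a formal test.

```r
set.seed(1)

# Simulated data standing in for a price ~ size regression
n <- 60
sqft <- runif(n, 800, 3500)
price <- 120000 + 200 * sqft + rnorm(n, sd = 50000)
fit <- lm(price ~ sqft)

# Shapiro-Wilk test on the residuals: a small p-value would suggest
# the residuals depart from normality.
normality <- shapiro.test(residuals(fit))

# Rough check for non-constant variance: correlation between the
# absolute residuals and the fitted values (near zero is reassuring).
spread_trend <- cor(abs(residuals(fit)), fitted(fit))

normality$p.value
spread_trend
```

With samples as small as those used here (8 to 24 listings per group), such checks have limited power, so they complement rather than replace visual inspection of residual plots.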
Suggestions for Further Improvements:
To enhance the thoroughness and applicability of future analyses, the following suggestions are offered:
Data Quality Assurance: Conduct a thorough data validation process to ensure the accuracy and completeness of the datasets used for analysis. This may involve cross-referencing with reliable sources or performing data cleaning and manipulation techniques to address missing or erroneous values.
Expanded Geographic Scope: Consider expanding the geographic scope of the analysis to include a more diverse range of regions or cities. This broader perspective can provide a more comprehensive understanding of regional variations in housing market dynamics and provide more insightful information.
Incorporation of Additional Variables: Explore the inclusion of additional variables that may impact home prices, such as crime rates, school quality, transportation infrastructure, or local economic indicators. This multidimensional approach can offer a more holistic view of the factors influencing housing markets.
By addressing these limitations and implementing these improvements, future analyses can offer more robust insights into the dynamics of housing markets and contribute to informed decision-making in real estate investment and policy development.
Overall, this EDA serves as a foundational exploration into the complex relationship between property characteristics and home prices in Maine and New Hampshire. By informing stakeholders, policymakers, and real estate professionals, these findings can contribute to more informed decision-making and strategic planning in the housing market, ultimately benefiting communities and promoting sustainable development.
Bluman, A. (2018). Elementary statistics: A step by step approach (10th ed.). McGraw Hill.
Centers for Disease Control and Prevention. (2024, January 9). CDC - BRFSS. Centers for Disease Control and Prevention. https://www.cdc.gov/brfss/index.html
Kabacoff, R. I. (2022). R in action: Data analysis and graphics with R and tidyverse (3rd ed.).